Explainable AI

Model explanations – Why should we care?

Utility: debugging, bias detection, recourse, deciding if and when to trust model predictions, and vetting models to assess suitability for deployment. Stakeholders: end users, decision makers, regulators, researchers, and engineers

Models, models, more models …

  • For a long time, the media have uncritically glorified data, machine learning, and artificial intelligence
  • The dominant narrative is often that almost every problem can be solved with enough data
  • Serious people are making statements like “there is no point in training radiologists, because they will be replaced by AI”
  • As with other bubbles, anything labeled AI has attracted (unhealthy) attention
  • The media have raced to announce each new problem AI has solved

… however, not every model works …

There is tremendous potential in AI, but:

  • there is a growing list of examples in which, despite initial bursts of promise, AI systems did not perform as expected
  • good results on training data did not transfer to real-world data
  • systems performed in blatantly incorrect ways, even though they seemed to work very well during training
  • at this point we could discuss various examples of spectacular failures of AI for the next two hours
  • https://incidentdatabase.ai
  • see “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy” by Cathy O’Neil for a very nice overview of these problems

… and sometimes it has serious consequences

  • Many AI failures are comical and entertaining, like a Roomba spreading dirt all over a room.
  • But AI systems are also increasingly being used for serious applications in the fields of biology, medicine or finance. And here subtle mistakes in operation can give rise to serious consequences.
  • In fact, every major company developing AI solutions has such failures on its conscience. The picture below is from a StatNews report on the implementation of IBM Watson for Oncology. Despite enormous resources and even greater hopes, the system was not well received by doctors. Its recommendations were called inaccurate and unsafe.

Read more: https://www.statnews.com/

This should not happen

  • How do we know what the model has learned? Maybe it bases decisions on some strange artifact?
  • This is not a made-up possibility: in the example below, the model’s decisions correlated strongly with the presence of captions in the lower left corner.
  • It turns out that in the training data, horse pictures often had a caption in the lower left corner. Instead of learning to recognize the characteristics of horses, it is much easier to recognize the presence of text in the lower left corner.

Responsible and ethical AI - the business response

  • On the websites of many companies offering AI-related products and services, you can find pages devoted to “Trustworthy AI”.
  • On the slide we have sites of companies producing software for (Auto)ML, namely H2O; consulting companies such as McKinsey, PwC, and IBM; and product companies such as Samsung and Google.
  • Many companies are outdoing themselves in presenting their principles, which include slogans such as Transparency, Fairness, and Explainability. How can these slogans be realized?

The right to an explanation in Europe

From Recital 71 EU GDPR

“(71) The data subject should have the right not to be subject to a decision, which may include a measure, evaluating personal aspects relating to him or her which is based solely on automated processing and which produces legal effects concerning him or her or similarly significantly affects him or her, such as automatic refusal of an online credit application or e-recruiting practices without any human intervention.

Such processing includes ‘profiling’ that consists of any form of automated processing of personal data evaluating the personal aspects relating to a natural person, in particular to analyse or predict aspects concerning the data subject’s performance at work, economic situation, health, personal preferences or interests, reliability or behaviour, location or movements, where it produces legal effects concerning him or her or similarly significantly affects him or her.

However, decision-making based on such processing, including profiling, should be allowed where expressly authorised by Union or Member State law to which the controller is subject, including for fraud and tax-evasion monitoring and prevention purposes conducted in accordance with the regulations, standards and recommendations of Union institutions or national oversight bodies and to ensure the security and reliability of a service provided by the controller, or necessary for the entering or performance of a contract between the data subject and a controller, or when the data subject has given his or her explicit consent.

In any case, such processing should be subject to suitable safeguards, which should include specific information to the data subject and the right to obtain human intervention, to express his or her point of view, to obtain an explanation of the decision reached after such assessment and to challenge the decision.”

How to think about the explainability of predictive models

When we think about the interpretability of models we usually distinguish three classes of methods

  • Interpretable by design, i.e. models whose structure allows us to directly analyze how the prediction was formed. For different classes of models, explanations may look different, but they are based directly on model parameters: for linear models these are the coefficients, for k-nearest neighbors the neighbors themselves, for naive Bayes the marginal distributions
  • Model specific, i.e. methods for models whose structure is complex but can be summarized or represented to better understand the relationship between input and output. The two most common classes of models with model-specific explanations are tree ensembles (here we can summarize the tree structure) and neural networks (here we can usually summarize the flow of the signal through the network)
  • Model agnostic, i.e. the methods to which this course is devoted: methods that assume nothing about the structure of the model and can be used for models with different structures. Moreover, they can be used to compare models with different structures.

The pyramid of explainability

  • In this class we will discuss several techniques for global and local analysis of the model.
  • Global analysis is concerned with the behavior of the model on the entire data
  • Local analysis deals with the model’s behavior on one/some observations

Shift in our focus: Statistics

  • Statistical analysis of data most often assumes a great deal of knowledge about the phenomenon. Understanding the data allows one to choose appropriate transformations and representations. Verification is oriented toward hypothesis testing, e.g. via p-values

Shift in our focus: Machine Learning

  • Machine learning puts a priority on optimizing the model, especially for performance. There is a lot of searching through the space of possible solutions here to find the best one
  • Knowledge of the phenomenon is no longer so important

Properties of interpretable models

  • Transparency - the ability to understand the model’s behavior
  • Simplicity - the model should be simple enough to be understood by a human
  • Accuracy - the model should be accurate enough to be useful
  • Consistency - the model should be consistent with the domain knowledge
  • Stability - the model should be stable, i.e. small changes in the data should not lead to large changes in the model
  • Fairness - the model should be fair, i.e. it should not discriminate against any group

Transparency: Simulatability

  • Can a person contemplate the entire model at once?
  • Note that this constrains us to a very simple model!
  • Can a person take the input data and model parameters and calculate the prediction?
  • This is a very strong requirement, but it is often used as a benchmark for the interpretability of the model

Transparency: Decomposability

  • Can the model be decomposed into smaller parts?
  • Even if the model is too complex to be understood in its entirety, can we understand the behavior of individual parts?
  • For example in the case of a random forest, we can understand the behavior of individual trees
  • Inputs must be interpretable, i.e. we must understand what the input means - otherwise we can understand the mathematical function but not the interpretation of the model

Transparency: Algorithmic transparency

  • Can we understand how and what the model has learned?
  • Possible in the case of linear models, decision trees, etc.
  • Modern deep neural networks are often criticized for their lack of algorithmic transparency
  • Worth noting: humans lack all of these types of transparency!

Post-Hoc Explanations: Text

  • Humans often justify decisions verbally (even if the decision was made unconsciously)
  • Post-hoc explanations are explanations that are generated after the decision has been made
  • For some models, we can develop post-hoc explanations that are similar to human explanations
  • For example, we can generate a list of words that were most important in the decision
  • In other cases, explanations are trained to maximize likelihood of ground truth explanations from humans
  • This means that the explanations don’t necessarily describe the model’s process, but human intuition

Post-Hoc Explanations: Visual

  • Humans are very good at understanding visual information
  • For example, we can generate heatmaps that show which parts of the image were most important for the decision
  • In CNNs, this can be done using backpropagation to the input
  • https://poloclub.github.io/cnn-explainer/
  • For other models, high dimensional decision functions can be visualized using dimensionality reduction techniques

Post-Hoc Explanations: Examples

  • Reasoning by showing examples is a powerful way to explain decisions
  • e.g. Patient A has a tumour because they have similar attributes to these K patients who were diagnosed with a tumour
  • This is the basis of many recommendation systems (“shoppers like you bought” is already more convincing than “the algorithm says you are likely to buy”)
  • This is also the basis of many medical diagnosis systems. The user can inspect the diagnosis by looking at similar cases highlighted by the system

Post-Hoc Explanations: Local Explanations

  • While we can’t usually explain a complex model in its entirety, we can often explain behaviour in a specific region
  • This is typically expressed as the difference between ‘global’ and ‘local’ explanations
  • For example, only looking at patients under 30 with high blood pressure
  • We will get to some specific methods for local explanations later

Are there any situations where interpretability is not important (or even bad)?

  • Low-stakes decisions - if the decision is not important, it may not be worth the effort to interpret the model
  • High-dimensional data - in high-dimensional data, it may be impossible to interpret the model efficiently
  • Proprietary or sensitive models - if the model is proprietary or sensitive, it may be dangerous to reveal the model’s internals
  • Performance-critical applications - if the model is performance-critical, it may be better to use a more complex model

A Unified Approach to Interpreting Model Predictions

  • Next, we will dive into some specific and popular methods for explaining models.
  • These will be primarily post-hoc methods, and ones that can be applied to a wide array of methods (model-agnostic).
  • The first method we will discuss is SHAP, which is a unified approach to interpreting model predictions.
  • The method comes from the paper “A Unified Approach to Interpreting Model Predictions” by Lundberg and Lee (2017)

Why SHAP?

  • Shapley values are currently the most popular technique for model explanations (in almost every category: local, global, model agnostic, model specific…)
  • If you remember only one method after this course, let it be SHAP
  • In addition to the basic SHAP method, there are many extensions like ShapleyFlow or ASV
  • The figure below is from the paper Explainable AI Methods - A Brief Overview

XAI pyramid

  • This is one of the three fundamental approaches to explaining the behaviour of predictive models.
  • SHAP corresponds to panel C. We try to explain the behaviour of the model by decomposing the difference between this particular prediction and the average prediction of the model.

Game Theory Primer: Shapley Values

Notation

  • We have a set \(P = \{1, ..., p\}\) of players
  • For each coalition, i.e. subset \(S \subseteq P\), we can calculate the payout \(v(S)\), with \(v(\varnothing) = 0\)
  • We want to fairly distribute the payout \(v(P)\)
  • Optimal attribution for player \(i\in P\) will be denoted as \(\phi_i\)

Motivational example 1

Students A, B and C carry out a project together. With this payoff table, determine what portion of the award each student should get.

Motivational example 2

Students A, B and C carry out a project together. With this payoff table, determine what portion of the award each student should get.

Shapley values (via permutations)

  • The fair reward-sharing strategy for player \(j\in P\) will be denoted as \(\phi_j\). Surprise: these are Shapley values.
  • Note that added value of player \(j\) to coalition \(S\) is \(v(S \cup \{j\}) - v(S)\)
  • Shapley values are defined as

\[ \phi_j = \frac{1}{|P|!} \sum_{\pi \in \Pi} (v(S_j^\pi \cup \{j\}) - v(S_j^\pi)) \]

where \(\Pi\) is a set of all possible permutations of players \(P\) while \(S_j^\pi\) is a set of players that are before player \(j\) in permutation \(\pi\).

  • Instead of trying all \(|P|!\) permutations, one can use only \(B\) random permutations to estimate \(\phi_j\)

\[ \hat\phi_j = \frac{1}{|B|} \sum_{\pi \in B} (v(S_j^\pi \cup \{j\}) - v(S_j^\pi)) \]
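The permutation formula can be implemented directly as a sanity check. The three-player game below is a made-up example (not from the slides): players A and B produce a payout of 100 only together, while C always adds 20.

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values: average the marginal contribution of each
    player over all |P|! orderings. For B random permutations, replace
    `permutations(players)` with a sample of orderings."""
    phi = {j: 0.0 for j in players}
    orderings = list(permutations(players))
    for pi in orderings:
        coalition = frozenset()
        for j in pi:
            phi[j] += v(coalition | {j}) - v(coalition)
            coalition = coalition | {j}
    return {j: total / len(orderings) for j, total in phi.items()}

# Made-up payoff function: A and B are worth 100 only as a pair, C adds 20 alone
def v(S):
    payout = 0.0
    if "A" in S and "B" in S:
        payout += 100.0
    if "C" in S:
        payout += 20.0
    return payout

print(shapley_values({"A", "B", "C"}, v))  # A and B each get 50.0, C gets 20.0
```

C’s contribution is additive (always 20), so its Shapley value does not depend on the order in which it joins; A and B are symmetric and split their joint payout fairly.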

Shapley values for Machine Learning Models

Definitions

  • Let’s start with local explanations, focused on single point \(x\) and the model prediction \(f(x)\).

  • Now instead of players, you can think about variables. We will distribute a reward between variables to recognize their contribution to the model prediction \(f(x)\).

How to understand the value function

  • Let’s take a look at how the value function works for a set \(S\) of players using the Titanic data example and an explanation for the observation age=8, class=1st, fare=72, ….
  • Let’s consider the process of conditioning the distribution of data on consecutive variables. In the figure below, we start with the prediction distribution for all data; this corresponds to a coalition without players.
  • Then we add the player age, which means conditioning the data with the condition age=8.
  • Next, we add the class variable to the coalition, which means further conditioning the data with the condition class=1st. In the next step, we add fare to the coalition, and so on.
  • In the last step, once all the players are in the coalition, that is, all the variables, the model’s predictions will reduce to a single point \(f(x)\)

  • In fact, we are not interested in the distributions of conditional predictions, only in the expected value of these distributions. This is what our value function is.

  • The added value of variable \(j\) when added to the coalition \(S\) is the change in expected value. In the example below, adding the class variable to a coalition with the age variable increases the reward by \(0.086\).

Average of conditional contributions

  • The Shapley value is the average over all (or a large number of) orders in which variables are added to the coalition.
  • For diagnostic purposes, on graphs, we can also highlight the distribution of added values for different coalitions to get information on how much the effect of a given variable is additive, i.e. leads to the same added value regardless of the previous composition of the coalition.

  • Order matters. For a model that allows interactions, it is easy to find an example of a non-additive effect of a variable. How to explain the different effects of the age variable in the figure below?

From local to global – Feature importance

  • The SHAP method gives local explanations, i.e. explanations for each single observation. But we can convert them to global explanations by aggregating the explanations for individual observations.
  • For example, we can assess the importance of a variable by computing the average absolute value of its SHAP contributions.
  • Such a measure of the importance of variables does not depend on the model structure and can be used to compare models.
  • Below is an example for the model trained for Titanic data
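A minimal sketch of this aggregation, assuming we already have a matrix of local SHAP values (the numbers and variable names below are made up for illustration):

```python
import numpy as np

# Hypothetical local SHAP values: one row per observation, one column per variable
shap_values = np.array([
    [ 0.20, -0.05,  0.01],
    [-0.10,  0.15, -0.02],
    [ 0.30, -0.25,  0.00],
])
feature_names = ["age", "class", "fare"]

# Global importance = mean absolute SHAP value per variable
importance = np.abs(shap_values).mean(axis=0)
for name, imp in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")  # age: 0.200, class: 0.150, fare: 0.010
```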

From local to global – Summary plot

  • One of the most useful statistics is a plot summarizing the distribution of Shapley values for the data for each variable.
  • On the X axis are presented the Shapley values, in the rows are the variables. The color indicates whether an observation had a high or low value in that variable.
  • From the graph you can read which variables are important (they have a large spread of points)
  • You can read what is the relationship between the variable and the Shapley value, whether the color has a monotonic gradation or there are some dependencies
  • You can read the distribution of Shapley values

From local to global – Dependence plot

  • If we plot the Shapley values as functions of the value of the original variable, it is possible to see what kind of relationship exists between this variable and the average result.
  • This type of plot allows you to choose transformations of the variable and better understand the relationship between the variable and the result of the model

  • We can additionally color the graph depending on one more variable (in the example below, it is gender) to see if an interaction is present in the model. In this case, the attributes of the model will depend on the value of this additional variable.

Break & Mini-Lab 1

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

XAI pyramid

  • Thinking about the XAI pyramid, we are still in the same group of solutions as SHAP, i.e. local explanations focused on the importance of features
  • As with SHAP, local LIME explanations can be used to explain the global model

XAI pyramid

  • LIME is based on one of the three fundamental approaches to explanation of predictive models.
  • LIME corresponds to panel B – approximation with linear surrogate model to get some understanding about black-box model behavior around \(x\)

Start with Why

Desired characteristics of explanations (from LIME paper)

  • Explanations should be easy to understand = interpretable (simple, sparse, based on interpretable features) for a user
  • Good explanation should be model-agnostic, i.e. does not depend on model structure. This will help to compare explanations for different models
  • Local fidelity of explanations


Explanation process. Figure from LIME paper

Core idea

The core ideas behind LIME are:

  • Input to the model will be transformed into an interpretable feature space
  • Local model behaviour will be explained by approximating it by an interpretable surrogate model (e.g. a shallow tree or a linear regression model)
  • Local approximation is trained on artificial points generated from the neighborhood of the observation of interest \(x\)


Figure from EMA book

Fidelity-Interpretability Trade-off

The explanation will be a model \(g\) that approximates the behavior of the complex model \(f\) and is as simple as possible

\[ \hat g = \arg \min_{g \in G} L\{f, g, \pi(x)\} + \Omega(g) \]

where

  • \(f()\) is a model to be explained
  • \(x\) is an observation of interest
  • \(G\) is a class of interpretable models
  • \(\hat g\) is an explanation, a model from class \(G\)
  • \(\Omega(g)\) is a penalty function that measures complexity of models from \(G\).
  • \(L()\) is a function measuring the discrepancy between models \(f\) and \(g\) in the neighborhood \(\pi(x)\) of the observation of interest
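For sparse linear explanations, the LIME paper instantiates these components as a locally weighted squared loss with an exponential kernel (writing \(\pi_x\) for the neighborhood \(\pi(x)\)):

\[ L(f, g, \pi_x) = \sum_{z} \pi_x(z) \left( f(z) - g(z') \right)^2, \qquad \pi_x(z) = \exp\left( - D(x, z)^2 / \sigma^2 \right), \]

where \(D\) is a distance function (e.g. cosine distance for text, \(L_2\) distance for images), \(z\) ranges over perturbed samples with interpretable representations \(z'\), and \(\Omega(g)\) limits the explanation to at most \(K\) non-zero coefficients, realized by K-LASSO.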

LIME Algorithm

Explanations can be calculated with the following procedure.

  1. Let \(x'\) = \(h(x)\) be the version of \(x\) in the interpretable data space
  2. for i in 1…N {
  3.       z’[i] = sample_around(x’)
  4.       y’[i] = \(f\)(z’[i])
  5.       w’[i] = similarity(x’, z’[i])
  6. }
  7. return K-LASSO(y’, z’, w’)

where

  • \(x\) – an observation to be explained
  • \(N\) – sample size needed to fit a glass-box model
  • \(K\) – complexity, the maximum number of variables in the glass-box model
  • similarity – a distance function in the original data space
  • K-LASSO – a weighted LASSO linear-regression model that selects K variables
  • w’ – weights measuring the similarity between the original observation \(x\) and the newly generated observations
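The procedure above can be sketched end to end. The black-box \(f\), the kernel width, and the LASSO penalty below are illustrative assumptions, and scikit-learn's Lasso stands in for K-LASSO; for simplicity the black box operates directly on a 5-dimensional binary interpretable space.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Hypothetical black box on a 5-dimensional binary interpretable space:
# only features 0 and 2 actually influence the prediction
def f(z):
    return 3.0 * z[:, 0] - 2.0 * z[:, 2] + 0.1

x_prime = np.ones(5)          # x': observation of interest, all features "present"
N = 500
z = rng.integers(0, 2, size=(N, 5)).astype(float)  # sample_around(x'): flip bits to 0
y = f(z)                      # black-box predictions for the perturbed samples

# similarity: exponential kernel on the Hamming distance to x'
dist = np.abs(z - x_prime).sum(axis=1)
w = np.exp(-dist / 2.0)

# Sparse weighted surrogate model (stand-in for K-LASSO)
g = Lasso(alpha=0.05)
g.fit(z, y, sample_weight=w)
print(np.round(g.coef_, 2))   # large coefficients only on features 0 and 2
```

The surrogate recovers large coefficients only for the two features the black box actually uses; in a real application \(f\) would be evaluated in the original data space via the mapping back from \(z'\).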

Example: Duck or horse? 1/4

Let’s see how LIME can be used to solve this problem.

Initial settings

  • Let’s consider a VGG16 neural network trained on the ImageNet data
  • The input consists of images of 224 \(\times\) 224 pixels, and there are 1000 potential categories in the training data
  • The input space is of dimension 3 \(\times\) 224 \(\times\) 224, i.e. it is a 150 528-dimensional space
  • We need to translate the input to the interpretable data space; here the image will be transformed into superpixels, which are treated as binary features (see an example later)
  • In this example \(f()\) operates on a space with \(150 528\) dimensions, while the glass-box model \(g()\) operates on a binary space with \(100\) dimensions

Example: Duck or horse? 2/4

Interpretable data space

  • Interpretable data space is a binary space that encodes presence or absence of selected features
  • The interpretable space can be constructed globally (e.g. for tabular data) or locally (e.g. for images)
  • For image data, the most common approach constructs an interpretable data space for each observation separately by using a segmentation algorithm.
  • The result is the division of the input image into a certain number of regions called superpixels

Example: Duck or horse? 3/4

Sampling around x

  • We sample around the observation x’ in the interpretable space
  • It’s a binary space in which the observation of interest is represented by \(x'\), a vector of ones
  • Sampling corresponds to randomly selecting coordinates that will be flipped to zero
  • We need N such new observations

Example: Duck or horse? 4/4

Fitting of an interpretable model

  • For new data, we make predictions with model \(f()\)
  • And then for the observations in the interpretable representation we train a K-LASSO model which will have \(K\) non-zero coefficients
  • We can use the \(R^2\) coefficient to assess the quality of fit of the model \(g()\)

Interpretable data representations

How to transform the input data into a binary vector of shorter length?

  • For image data, the interpretable feature space is commonly based on superpixels, i.e. obtained through image segmentation
  • For text data, words or groups of words are frequently used as interpretable variables
  • For tabular data, continuous variables are often discretized to obtain interpretable binary variables.


Example from LIME github

Model debugging 1/3

  • There are many reasons to know and develop XAI techniques
  • One of them is the ability to debug the model
  • The most well-known example is improving the performance of a network that misclassified the following image
  • How LIME can help here?


Figure from presentation about LIME by Sameer Singh

Model debugging 2/3

  • The model works very well: classification between husky and wolf is accurate in almost every image except one. Why?


Figure from presentation about LIME by Sameer Singh

Model debugging 3/3

  • Can LIME’s explanation help us find the source of the problem?
  • It turns out that in the case of classification as a wolf, the important feature is the snow in the background
  • Effectively, the model has learned to recognize snow in the background, and it classifies images with snow as the wolf class
  • This is not a feature that people use for wolf/husky classification. But would you sacrifice the quality of the model to remove its reliance on the background?


Figure from presentation about LIME by Sameer Singh

  • Retraining that removed the model’s dependency on the snow feature improved the accuracy of the model

Explaining through examples

The LIME method was designed to explain the model’s behavior locally, around the observation of interest. But we are often interested in knowing or at least getting an intuition about how the model works globally.

Assuming the user has time to look at LIME explanations for B observations, the question is how to select them.

Submodular pick (SP) algorithm

The LIME paper presents a user-study example where the submodular picks method most effectively convinces the user how the model works.

Can non-experts improve a classifier?

  • The LIME paper describes the results of several experiments involving human subjects
  • Very interesting results involved using explanations to improve the model, even if the improvement is generated by the knowledge and actions of non-ML-experts
  • The experiment was based on a model for a classification task based on text data
  • The explanations of the model generated by the LIME method were then shown to the participants of the experiment. That is, for each observation, the relevant words were highlighted
  • Participants could determine that some of these words were artifacts and should not be used by the model
  • The model was then trained again on the remaining features, with the artifacts removed
  • It turns out that such feature engineering by non-experts led to better results after several rounds


Figure from the LIME paper

Break & Mini-Lab 2

Visualizing the effects of predictor variables

XAI pyramid

  • Thinking about the XAI pyramid, we now go to its third level, the level related to profile explanations.
  • We will focus on explanations for a single variable, but all the presented methods may be extended to two or more variables.

What we are going to explain

  • CP and PD are based on one of the three fundamental approaches to explanation of predictive models.
  • Methods that will be discussed today correspond to panel A – tracing model response along changes in a single variable to get some understanding about black-box model behavior around \(x\)

What-if?

  • When explaining anything, one of the most natural questions is: “what would happen if the input changed?”
  • Note that neither LIME nor SHAP directly answers this question. They indicate the importance of some variables, but give no answer as to what would happen if a variable increased or decreased.
  • For high-dimensional models we are not able to keep track of all possible changes, but we can look at one or two variables.
  • Here is an example of such local explanations: a continuous variable on the left, a categorical variable on the right.

Ceteris Paribus

Ceteris Paribus in action

  • Ceteris paribus is a Latin phrase meaning “all other things being equal” or “all else unchanged”, see Wikipedia.

  • It is a function defined for model \(f\), observation \(x\), and variable \(j\) as:

\[\begin{equation} h^{f}_{x,j}(z) = f\left(x_{j|=z}\right), \end{equation}\]

where \(x_{j|=z}\) stands for observation \(x\) with \(j\)-th coordinate replaced by value \(z\).

  • The Ceteris Paribus profile is a function that describes how the model response would change if the \(j\)-th variable were changed to \(z\) while the values of all other variables were kept fixed at the values specified by \(x\).

  • In an implementation we cannot check all possible values of \(z\); we have to meaningfully select a subset of them.

  • Note that CP profiles are also commonly referred to as Individual Conditional Expectation (ICE) profiles.
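A minimal sketch of the definition above, with a made-up pricing model standing in for \(f\) and a grid of candidate values of \(z\) for the first variable:

```python
import numpy as np

# Made-up model of apartment price from (surface, floor, rooms)
def f(X):
    X = np.atleast_2d(X)
    return 50.0 * X[:, 0] - 2.0 * X[:, 1] + 10.0 * X[:, 2]

def ceteris_paribus(f, x, j, grid):
    """CP profile: model response when the j-th coordinate of x is replaced
    by each value in `grid`, all other coordinates held fixed."""
    X = np.tile(x, (len(grid), 1))
    X[:, j] = grid
    return f(X)

x = np.array([60.0, 3.0, 2.0])    # observation of interest
grid = np.linspace(20, 100, 5)    # candidate values z for variable j = 0
profile = ceteris_paribus(f, x, j=0, grid=grid)
print(profile)  # [1014. 2014. 3014. 4014. 5014.]
```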

Ceteris Paribus - Many variables

  • No one wants to check CP profiles for all possible variables (of which there can be hundreds), only those in which “something is happening”. These variables can be identified based on the amplitude of their oscillations.
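A sketch of this screening step; the CP profiles below are made-up numbers for a model prediction \(f(x) = 0.5\), and oscillation is measured here as the mean absolute deviation of the profile from \(f(x)\):

```python
import numpy as np

# Made-up CP profiles evaluated on a small grid, one per variable
cp_profiles = {
    "age":   np.array([0.9, 0.7, 0.4, 0.2]),
    "fare":  np.array([0.45, 0.5, 0.55, 0.6]),
    "sibsp": np.array([0.5, 0.5, 0.5, 0.5]),
}
f_x = 0.5  # model prediction for the observation of interest

# Oscillation: mean absolute deviation of each CP profile from f(x)
oscillations = {v: float(np.abs(p - f_x).mean()) for v, p in cp_profiles.items()}
for name, osc in sorted(oscillations.items(), key=lambda t: -t[1]):
    print(f"{name}: {osc:.3f}")  # age: 0.250, fare: 0.050, sibsp: 0.000
```

A flat profile (sibsp) means the variable does not affect this prediction, so its plot can be skipped.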

Ceteris Paribus - Many models

  • CP profile is a convenient tool for comparing models.
  • It can be particularly useful for comparing models from different families, e.g. tree vs. linear; flexible vs. rigid.

Partial Dependence

Partial Dependence - intuition

  • As with other explanations, we can aggregate local explanations to get a global view of how the model works.
  • Let’s average Ceteris Paribus profiles.

Partial Dependence in action

  • Introduced in 2001 in the paper “Greedy Function Approximation: A Gradient Boosting Machine” by Jerome Friedman, The Annals of Statistics 2001
  • It is the Ceteris Paribus profile averaged over the marginal distribution of the remaining variables \(X_{-j}\).

\[ g^{PD}_{j}(z) = E_{X_{-j}} f(X_{j|=z}) . \]

  • The estimation is based on the average of the CP profiles.
  • The computational complexity is \(N \times Z\) model evaluations, where \(N\) is the number of observations and \(Z\) is the number of points at which the CP profile is calculated.

\[ \hat g^{PD}_{j}(z) = \frac{1}{n} \sum_{i=1}^{n} f(x^i_{j|=z}). \]
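The estimator above can be sketched directly; the model \(f\) with an interaction term and the data are made-up examples:

```python
import numpy as np

# Made-up model with an interaction between variables 0 and 1
def f(X):
    X = np.atleast_2d(X)
    return 2.0 * X[:, 0] + X[:, 0] * X[:, 1]

def partial_dependence(f, X, j, grid):
    """PD profile: for each z in grid, replace the j-th column of the whole
    dataset by z and average the predictions (the mean of the CP profiles).
    Costs len(grid) * len(X) model evaluations."""
    values = []
    for z in grid:
        Xz = X.copy()
        Xz[:, j] = z
        values.append(f(Xz).mean())
    return np.array(values)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
grid = np.linspace(-2, 2, 5)
print(partial_dependence(f, X, j=0, grid=grid))
```

Because of the interaction, individual CP profiles for variable 0 have different slopes (they depend on variable 1), while the PD profile averages them into a single line with slope \(2 + \bar{X}_1\).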

Ceteris Paribus and Partial Dependence - Pros and Cons

Pros

  • Easy to communicate, and extendable approach to model exploration.
  • Graphical representation is easy to understand and explain.
  • CP profiles are easy to compare, as we can overlay profiles for two or more models to better understand differences between the models.

Cons

  • May lead to out-of-distribution problems if correlated explanatory variables are present. In this case application of the ceteris-paribus principle may lead to unrealistic settings and misleading results.
  • Think about prediction of an apartment’s price and correlated variables like no. rooms and surface area. You cannot change no. rooms freely keeping the surface constant.
  • Will not explain high-order interactions. Pairwise interactions require the use of two-dimensional CP and so on.
  • For models with hundreds or thousands of variables, the number of plots to inspect grows with the number of variables.

Summary

  • Today we have discussed the three fundamental methods of explaining the behavior of predictive models.

  • SHAP is a method that allows us to explain the behavior of the model by decomposing the difference between a particular prediction and the average prediction of the model.

  • LIME is a method that allows us to explain the behavior of the model by approximating it with a linear surrogate model.

  • CP and PD are methods that allow us to explain the behavior of the model by tracing the model response along changes in a single variable.

  • All these methods can be used to explain the behavior of the model globally, but also to compare models with different structures.

  • The choice of the method depends on the problem we are dealing with, the structure of the model, and the preferences of the user.

Break & Mini-Lab 3